Duration modification using glottal closure instants and vowel onset points

Authors

  • K. Sreenivasa Rao, School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
  • Bayya Yegnanarayana, International Institute of Information Technology (IIIT), Gachibowli, Hyderabad 500032, Andhra Pradesh, India
Abstract

This paper proposes a method for duration (time-scale) modification using glottal closure instants (GCIs, also known as instants of significant excitation) and vowel onset points (VOPs). Most time-scale modification methods vary the duration of speech segments uniformly over all regions. However, it has been observed that the durations of consonant regions, and of the transition regions between a consonant and the following vowel and between two consonant regions, do not vary appreciably with speaking rate. The proposed method therefore modifies duration without changing the durations of the transition and consonant regions. Vowel onset points are used to identify these regions: a VOP is the instant at which the onset of a vowel takes place, which in most cases corresponds to the transition from a consonant to the following vowel. The VOPs are computed using the Hilbert envelope of the linear prediction (LP) residual. The instants of significant excitation correspond to the instants of glottal closure (epochs) in voiced speech, and to random excitations such as the onset of a burst in nonvoiced speech. Duration is manipulated by modifying the duration of the LP residual, using the instants of significant excitation as pitch markers. The modified residual is used to excite a time-varying filter whose parameters are derived from the original speech signal. The perceptual quality of the synthesized speech is found to be natural. The performance of the proposed method is compared with that of a method in which the duration of speech is modified uniformly over all regions. Samples of speech signals for different modification factors are available for listening at http://sit.iitkgp.ernet.in/∼ksrao/result.html
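The abstract states that VOPs are computed from the Hilbert envelope of the LP residual. The sketch below computes those two quantities with NumPy under assumed settings (10th-order autocorrelation LP on non-overlapping 20 ms Hamming-windowed frames). The abstract does not give the analysis parameters or the rule for picking VOPs out of the envelope, so the sketch stops at the envelope; all function names are hypothetical.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LP coefficients a[1..order]; the inverse
    (prediction-error) filter is A(z) = 1 + sum_k a[k] z^-k."""
    w = frame * np.hamming(len(frame))
    full = np.correlate(w, w, mode="full")
    r = full[len(w) - 1 : len(w) + order]            # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += (1e-9 * r[0] + 1e-12) * np.eye(order)        # keeps silent frames solvable
    return np.linalg.solve(R, -r[1:])                 # normal equations R a = -r

def lp_residual(s, fs, order=10, frame_ms=20.0):
    """Inverse-filter s frame by frame (assumed non-overlapping frames)."""
    s = np.asarray(s, dtype=float)
    n = int(fs * frame_ms / 1000)
    e = np.zeros_like(s)
    for start in range(0, len(s), n):
        frame = s[start:start + n]
        if len(frame) <= order:                       # too short to analyse
            break
        a = lpc(frame, order)
        lo = max(0, start - order)                    # past samples the filter needs
        filt = np.convolve(s[lo:start + len(frame)], np.concatenate(([1.0], a)))
        off = start - lo
        e[start:start + len(frame)] = filt[off:off + len(frame)]
    return e

def hilbert_envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative FFT bins."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))
```

For a speech signal `s` sampled at `fs`, the VOP evidence in this sketch would be `hilbert_envelope(lp_residual(s, fs))`.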

Similar resources

Vowel onset point detection for noisy speech using spectral energy at formant frequencies

In this paper, we propose a method for robust detection of the vowel onset points (VOPs) from noisy speech. The proposed VOP detection method exploits the spectral energy at formant frequencies of the speech segments present in glottal closure region. In this work, formants are extracted by using group delay function, and glottal closure instants are extracted by using zero frequency filter bas...

Prosodic manipulation using instants of significant excitation

This paper proposes a technique for prosodic (pitch and duration) manipulation using instants of significant excitation. Instants of significant excitation correspond to the instants of glottal closure (epochs) in voiced speech and to some random excitations like burst onset in the case of nonvoiced speech. Instants of significant excitation are computed from the average group delay of minimum ...
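Prosodic manipulation with instants of significant excitation treats each epoch as a pitch marker. For duration, a minimal sketch (assuming integer, strictly increasing epoch indices and whole-cycle copying of the LP residual, which is a simplification) maps each new epoch back to an original pitch cycle while keeping the local pitch period, so cycles are repeated or dropped. `modify_duration` and `build_residual` are hypothetical names; passing the new residual through the time-varying LP filter is not shown.

```python
from bisect import bisect_right

def modify_duration(epochs, alpha):
    """Return (new_epoch, source_cycle) pairs for a duration factor alpha.
    Source cycle i is the original residual from epochs[i] to epochs[i+1]."""
    if len(epochs) < 2 or alpha <= 0:
        raise ValueError("need at least 2 epochs and alpha > 0")
    start = epochs[0]
    end_out = start + alpha * (epochs[-1] - start)   # target span
    out = []
    t_out = float(start)
    while t_out <= end_out:
        t_in = start + (t_out - start) / alpha       # map back to input time
        i = min(bisect_right(epochs, t_in) - 1, len(epochs) - 2)
        out.append((round(t_out), i))
        t_out += epochs[i + 1] - epochs[i]           # keep the local pitch period
    return out

def build_residual(residual, epochs, mapping):
    """Place the residual cycle named in mapping at each new epoch."""
    last_pos, last_i = mapping[-1]
    length = last_pos + (epochs[last_i + 1] - epochs[last_i])
    out = [0.0] * length
    for pos, i in mapping:
        cycle = residual[epochs[i]:epochs[i + 1]]
        out[pos:pos + len(cycle)] = list(cycle)
    return out
```

With a single factor everywhere, this is the uniform modification that the main paper uses as its point of comparison; its VOP-based method would instead keep the factor at 1 inside consonant and transition regions.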

Significance of instants of significant excitation for source modeling

The objective of this work is to demonstrate the significance of instants of significant excitation for source modeling. Instants of significant excitation correspond to the glottal closure, glottal opening, onset of burst, frication and a small number of excitation instants around them. The speech signal is processed independently by zero frequency filtering (ZFF) to obtain epochs. The epochs ...
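Zero-frequency filtering (ZFF), as commonly described, differences the signal, passes it through two cascaded zero-frequency resonators, removes the resulting trend by repeatedly subtracting a local mean over a window of about one to two average pitch periods, and takes the negative-to-positive zero crossings as epochs. A NumPy sketch follows; the 10 ms window, the three trend-removal passes and the function name are assumptions, as is the polarity convention (glottal closure appearing as a negative-going excitation in the recording).

```python
import numpy as np

def zff_epochs(s, fs, win_ms=10.0, passes=3):
    """Epoch (GCI) estimates by zero-frequency filtering; returns sample indices.
    win_ms is an assumed trend-removal window of roughly 1-2 pitch periods."""
    s = np.asarray(s, dtype=float)
    x = np.diff(s, prepend=0.0)             # x[n] = s[n] - s[n-1] removes DC
    # Each zero-frequency resonator y[n] = 2y[n-1] - y[n-2] + x[n] is exactly a
    # double running sum, so two cascaded resonators are four cumulative sums.
    # The sums grow polynomially, so this suits short segments.
    y = x
    for _ in range(4):
        y = np.cumsum(y)
    half = int(fs * win_ms / 1000) // 2
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):                 # repeated local-mean subtraction
        y = y - np.convolve(y, kernel, mode="same")
    # Negative-to-positive zero crossings of the filtered signal mark epochs.
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
```

Near the ends of the signal the local mean is biased by the implicit zero padding, so crossings found there are less reliable.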

Emotion conversion using Feedforward Neural Networks

An emotion is made of several components such as physiological changes in the body, subjective feelings, and expressive behaviours. These changes in speech signal are mainly observed in prosody parameters such as pitch, duration and energy. In this work, prosody parameters are modified using instants of significant excitation (epochs) and these instants are detected using Zero Frequency Filteri...
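Epoch-based prosody modification also covers pitch. The sketch below (hypothetical `modify_pitch`, assuming integer, strictly increasing epoch indices) keeps the overall time span fixed and places new epochs with each local period divided by a factor `beta`, returning which original cycle supplies the excitation at each new epoch. It says nothing about how the cited work predicts prosody with neural networks.

```python
from bisect import bisect_right

def modify_pitch(epochs, beta):
    """Return (new_epoch, source_cycle) pairs with the local pitch period
    divided by beta (beta > 1 raises pitch) over the original time span."""
    if len(epochs) < 2 or beta <= 0:
        raise ValueError("need at least 2 epochs and beta > 0")
    out = []
    t = float(epochs[0])
    while t <= epochs[-1]:
        # Original cycle that contains time t supplies the excitation.
        i = min(bisect_right(epochs, t) - 1, len(epochs) - 2)
        out.append((round(t), i))
        t += (epochs[i + 1] - epochs[i]) / beta   # scaled local period
    return out
```

How each copied residual cycle is then shortened or padded to fit its new period is a separate design choice not covered here.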

Automatic pitch marking and reconstruction of glottal closure instants from noisy and deformed electro-glotto-graph signals

Pitch tracking and pitch marking (PM) are two important speech signal analysis techniques for several applications. The accuracy of both pitch marking and tracking is significant to generate smooth synthesized speech by controlling the pitch and duration of voiced speech in Text-to-Speech (TTS) system for example. In this paper, we present a novel hybrid approach, combining electro-glotto-graph...
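As a simple point of reference (not the hybrid method summarized above), GCIs are often located at the prominent positive peaks of the differentiated EGG (dEGG). This sketch assumes the EGG rises with increasing vocal-fold contact; `degg_gcis`, the relative threshold and the minimum peak spacing are hypothetical and would need tuning, and the noisy or deformed EGG signals addressed by the cited work would defeat such a simple rule.

```python
import numpy as np

def degg_gcis(egg, fs, rel_threshold=0.3, min_gap_ms=2.0):
    """GCI candidates as strong local maxima of the differentiated EGG."""
    egg = np.asarray(egg, dtype=float)
    d = np.diff(egg, prepend=egg[0])          # dEGG, same length as egg
    thr = rel_threshold * np.max(d)           # assumed relative threshold
    gap = max(1, int(fs * min_gap_ms / 1000))
    core = d[1:-1]
    cand = np.where((core > d[:-2]) & (core >= d[2:]) & (core > thr))[0] + 1
    gcis = []
    for c in cand:
        if gcis and c - gcis[-1] < gap:
            if d[c] > d[gcis[-1]]:            # keep the stronger of close peaks
                gcis[-1] = c
        else:
            gcis.append(c)
    return np.array(gcis, dtype=int)
```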

Journal:
  • Speech Communication

Volume 51

Publication date: 2009